Abstract. This paper studies precision matrix estimation for multiple related
Gaussian graphical models from a dataset consisting of different classes, based
upon the formulation of this problem as group graphical lasso. In particular, this
paper proposes a novel hybrid covariance thresholding algorithm that can effectively
identify zero entries in the precision matrices and split a large joint graphical
lasso problem into many small subproblems. Our hybrid covariance thresholding
method is superior to existing uniform thresholding methods in that it can split
the precision matrix of each individual class using a different partition scheme
and thus split group graphical lasso into much smaller subproblems, each of which
can be solved very fast. This paper also establishes necessary and sufficient
conditions for our hybrid covariance thresholding algorithm. Experimental results
on both synthetic and real data validate the superior performance of our
thresholding method over the others.


1 Introduction 

Graphs have been widely used to describe the relationship between variables (or
features). Estimating an undirected graphical model from a dataset has been
extensively studied. When the dataset has a Gaussian distribution, the problem is
equivalent to estimating a precision matrix from the empirical (or sample)
covariance matrix. In many real-world applications, the precision matrix is sparse.
This problem can be formulated as graphical lasso, and many algorithms (e.g., [19])
have been proposed to solve it. To take advantage of the sparsity of the precision
matrix, covariance thresholding (also called screening) methods have been developed
to detect zero entries in the matrix and then split the matrix into smaller
submatrices, which can significantly speed up the process of estimating the entire
precision matrix.

Recently, there have been a few studies on how to jointly estimate multiple related
graphical models from a dataset with a few distinct class labels (e.g., [20], [23]).
The underlying reason for joint estimation is that the graphs of these classes are
similar to some degree, so aggregating data of different classes can increase
statistical power and estimation accuracy. This joint graph estimation problem can
be formulated as joint graphical lasso, which makes use of the similarity of the
underlying graphs. In addition to group graphical lasso, Guo et al. used a
non-convex hierarchical penalty to promote similar patterns among multiple
graphical models [6]; other work introduced the popular group and fused graphical
lasso; and efficient algorithms have been proposed to solve fused graphical lasso
(e.g., [20]). To model gene networks, a node-based penalty has been proposed to
promote hub structure in a graph.

Existing algorithms for solving joint graphical lasso do not scale well with respect
to the number of classes, denoted as K, and the number of variables, denoted as p.
Similar to covariance thresholding methods for graphical lasso, a couple of
thresholding methods [25], [20] have been developed to split a large joint graphical
lasso problem into subproblems. Nevertheless, these algorithms all use uniform
thresholding to decompose the precision matrices of distinct classes in exactly the
same way. As such, they may not split the precision matrices into small enough
submatrices, especially when there are a large number of classes and/or the
precision matrices have different sparsity patterns. Therefore, the speedup effect
of covariance thresholding may not be very significant.

In contrast to the above-mentioned uniform covariance thresholding, this paper
presents a novel hybrid (or non-uniform) thresholding approach that can divide the
precision matrix for each individual class into smaller submatrices without
requiring that the resultant partition schemes be exactly the same across all the
classes. Using this method, we can split a large joint graphical lasso problem into
much smaller subproblems. Then we employ the popular ADMM (Alternating Direction
Method of Multipliers [2], [5]) method to solve joint graphical lasso based upon
this hybrid partition scheme. Experiments show that our method can solve group
graphical lasso much more efficiently than uniform thresholding.

This hybrid thresholding approach is derived based upon group graphical lasso. The
idea can also be generalized to other joint graphical lasso formulations such as
fused graphical lasso. Due to space limits, the proofs of some of the theorems in
the paper are presented in the supplementary material.

2 Notation and Definition 

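In this paper, we use a script letter, like $\mathcal{H}$, to denote a set or a set
partition. When $\mathcal{H}$ is a set, we use $\mathcal{H}_i$ to denote the $i$-th
element. Similarly we use a bold letter, like $\mathbf{H}$, to denote a graph, a
vector or a matrix. When $\mathbf{H}$ is a matrix we use $\mathbf{H}_{i,j}$ to
denote its $(i,j)$-th entry. We use $\{\mathcal{H}^{(1)}, \mathcal{H}^{(2)}, \ldots,
\mathcal{H}^{(N)}\}$ and $\{\mathbf{H}^{(1)}, \mathbf{H}^{(2)}, \ldots,
\mathbf{H}^{(N)}\}$ to denote $N$ objects of the same category.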

Let $\{X^{(1)}, X^{(2)}, \ldots, X^{(K)}\}$ denote a sample dataset of $K$ classes,
where the data in $X^{(k)}$ ($1 \le k \le K$) are independently and identically
drawn from a $p$-dimensional normal distribution $N(\mu^{(k)}, \Sigma^{(k)})$. Let
$S^{(k)}$ and $\hat{\Theta}^{(k)}$ denote the empirical covariance and (optimal)
precision matrices of class $k$, respectively. By "optimal" we mean the precision
matrices are obtained by exactly solving joint graphical lasso. Let a binary matrix
$E^{(k)}$ denote the sparsity pattern of $\hat{\Theta}^{(k)}$, i.e., for any $i, j$
($1 \le i, j \le p$), $E^{(k)}_{i,j} = 1$ if and only if
$\hat{\Theta}^{(k)}_{i,j} \ne 0$.

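Set partition. A set $\mathcal{H}$ is a partition of a set $C$ when the following
conditions are satisfied: 1) any element in $\mathcal{H}$ is a subset of $C$; 2) the
union of all the elements in $\mathcal{H}$ is equal to $C$; and 3) any two elements
in $\mathcal{H}$ are disjoint. Given two partitions $\mathcal{H}$ and $\mathcal{F}$
of a set $C$, we say that $\mathcal{H}$ is finer than $\mathcal{F}$ (or
$\mathcal{H}$ is a refinement of $\mathcal{F}$), denoted as
$\mathcal{H} \preceq \mathcal{F}$, if every element in $\mathcal{H}$ is a subset of
some element in $\mathcal{F}$. If $\mathcal{H} \preceq \mathcal{F}$ and
$\mathcal{H} \ne \mathcal{F}$, we say that $\mathcal{H}$ is strictly finer than
$\mathcal{F}$ (or $\mathcal{H}$ is a strict refinement of $\mathcal{F}$), denoted as
$\mathcal{H} \prec \mathcal{F}$.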
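The refinement relation can be checked mechanically. The following is a minimal sketch, assuming partitions are represented as lists of Python sets; the function names are illustrative and do not come from the paper.

```python
def is_partition(blocks, universe):
    """Return True if `blocks` are pairwise disjoint subsets covering `universe`."""
    seen = set()
    for block in blocks:
        block = set(block)
        if not block <= set(universe) or seen & block:
            return False
        seen |= block
    return seen == set(universe)


def is_finer(h, f):
    """Return True if every block of `h` is a subset of some block of `f`."""
    return all(any(set(a) <= set(b) for b in f) for a in h)


def is_strictly_finer(h, f):
    """Finer, and not the same collection of blocks."""
    same = {frozenset(a) for a in h} == {frozenset(b) for b in f}
    return is_finer(h, f) and not same


C = {1, 2, 3, 4, 5, 6}
H = [{1, 2}, {3}, {4, 5, 6}]
F = [{1, 2, 3}, {4, 5, 6}]
print(is_partition(H, C), is_finer(H, F), is_finer(F, H))  # True True False
```

Here H refines F because each block of H sits inside a block of F, while the converse fails for the block {1, 2, 3}.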

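Let $\Theta$ denote a matrix describing the pairwise relationship of elements in a
set $C$, where $\Theta_{i,j}$ corresponds to two elements $C_i$ and $C_j$. Given a
partition $\mathcal{H}$ of $C$, we define $\Theta_{\mathcal{H}_k}$ as a
$|\mathcal{H}_k| \times |\mathcal{H}_k|$ submatrix of $\Theta$, where
$\mathcal{H}_k$ is an element of $\mathcal{H}$ and
$(\Theta_{\mathcal{H}_k})_{i,j} = \Theta_{(\mathcal{H}_k)_i, (\mathcal{H}_k)_j}$
for any suitable $(i, j)$.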

Graph-based partition. Let $V = \{1, 2, \ldots, p\}$ denote the variable (or
feature) set of the dataset. Let graph $G^{(k)} = (V, E^{(k)})$ denote the $k$-th
estimated concentration graph, $1 \le k \le K$. This graph defines a partition
$\Pi^{(k)}$ of $V$, where an element in $\Pi^{(k)}$ corresponds to a connected
component in $G^{(k)}$. The matrix $\hat{\Theta}^{(k)}$ can be divided into disjoint
submatrices based upon $\Pi^{(k)}$. Let $E$ denote the mix of $E^{(1)}, E^{(2)},
\ldots, E^{(K)}$, i.e., one entry $E_{i,j}$ is equal to 1 if there exists at least
one $k$ ($1 \le k \le K$) such that $E^{(k)}_{i,j}$ is equal to 1. We can construct
a partition $\Pi$ of $V$ from graph $G = (V, E)$, where an element in $\Pi$
corresponds to a connected component in $G$. Obviously, $\Pi^{(k)} \preceq \Pi$
holds since $E^{(k)}$ is a subset of $E$. This implies that for any $k$, the matrix
$\hat{\Theta}^{(k)}$ can be divided into disjoint submatrices based upon $\Pi$.
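To make the graph-based partition concrete, the sketch below computes the connected components of two class graphs and of their mixing graph. The edge sets are hypothetical, chosen only so that each class splits into two components while the mixing graph does not; the helper name is likewise an assumption.

```python
def connected_components(p, edges):
    """Partition variables 1..p into connected components of an undirected graph."""
    parent = list(range(p + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for v in range(1, p + 1):
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=min)


p = 6
edges_1 = {(1, 2), (2, 3), (4, 5), (5, 6)}   # hypothetical class-1 edges
edges_2 = {(2, 3), (3, 6), (1, 4), (4, 5)}   # hypothetical class-2 edges

pi_1 = connected_components(p, edges_1)            # [{1, 2, 3}, {4, 5, 6}]
pi_2 = connected_components(p, edges_2)            # [{1, 4, 5}, {2, 3, 6}]
pi = connected_components(p, edges_1 | edges_2)    # [{1, 2, 3, 4, 5, 6}]
print(pi_1, pi_2, pi)
```

Each class-specific partition is finer than the partition of the mixing graph, which here collapses to a single group.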

Feasible partition. A partition $\mathcal{H}$ of $V$ is feasible for class $k$ or
graph $G^{(k)}$ if $\Pi^{(k)} \preceq \mathcal{H}$. This implies that 1)
$\mathcal{H}$ can be obtained by merging some elements in $\Pi^{(k)}$; 2) each
element in $\mathcal{H}$ corresponds to a union of some connected components in
graph $G^{(k)}$; and 3) we can divide the precision matrix $\hat{\Theta}^{(k)}$ into
independent submatrices according to $\mathcal{H}$ and then separately estimate the
submatrices without losing accuracy. $\mathcal{H}$ is uniformly feasible if for all
$k$ ($1 \le k \le K$), $\Pi^{(k)} \preceq \mathcal{H}$ holds.

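Let $\mathcal{H}^{(1)}, \mathcal{H}^{(2)}, \ldots, \mathcal{H}^{(K)}$ denote $K$
partitions of the variable set $V$. If for each $k$ ($1 \le k \le K$),
$\Pi^{(k)} \preceq \mathcal{H}^{(k)}$ holds, we say $\{\mathcal{H}^{(1)},
\mathcal{H}^{(2)}, \ldots, \mathcal{H}^{(K)}\}$ is a feasible partition of $V$ for
the $K$ classes or graphs. When at least two of the $K$ partitions are not the same,
we say $\{\mathcal{H}^{(1)}, \mathcal{H}^{(2)}, \ldots, \mathcal{H}^{(K)}\}$ is a
non-uniform partition. Otherwise, $\{\mathcal{H}^{(1)}, \mathcal{H}^{(2)}, \ldots,
\mathcal{H}^{(K)}\}$ is a class-independent or uniform partition and abbreviated as
$\mathcal{H}$. That is, $\mathcal{H}$ is uniformly feasible if for all $k$
($1 \le k \le K$), $\Pi^{(k)} \preceq \mathcal{H}$ holds. Obviously,
$\{\Pi^{(1)}, \Pi^{(2)}, \ldots, \Pi^{(K)}\}$ is finer than any non-uniform feasible
partition of the $K$ classes. Based upon the above definitions, we have the
following theorem, which is proved in the supplementary material.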

Theorem 1 For any uniformly feasible partition $\mathcal{H}$ of the variable set
$V$, we have $\Pi \preceq \mathcal{H}$. That is, $\mathcal{H}$ is feasible for graph
$G$ and $\Pi$ is the finest uniform feasible partition.

Proof First, for any element $\mathcal{H}_j$ in $\mathcal{H}$, $G$ does not contain
edges between $\mathcal{H}_j$ and $V - \mathcal{H}_j$. Otherwise, since $G$ is the
mixing (or union) of all $G^{(k)}$, there exists at least one graph $G^{(k)}$ such
that it contains at least one edge between $\mathcal{H}_j$ and
$V - \mathcal{H}_j$. Since $\mathcal{H}_j$ is the union of some elements in
$\Pi^{(k)}$, this implies that there exist two different elements in $\Pi^{(k)}$
such that $G^{(k)}$ contains edges between them, which contradicts the fact that
$G^{(k)}$ does not contain edges between any two elements in $\Pi^{(k)}$. That is,
$\mathcal{H}$ is feasible for graph $G$.

Second, if $\Pi \preceq \mathcal{H}$ does not hold, then there is one element
$\Pi_i$ in $\Pi$ and one element $\mathcal{H}_j$ in $\mathcal{H}$ such that
$\Pi_i \cap \mathcal{H}_j \ne \emptyset$ and $\Pi_i - \mathcal{H}_j \ne \emptyset$.
Based on the above paragraph, $\forall x \in \Pi_i \cap \mathcal{H}_j$ and
$\forall y \in \Pi_i - \mathcal{H}_j = \Pi_i \cap (V - \mathcal{H}_j)$, we have
$E_{x,y} = E_{y,x} = 0$. That is, $\Pi_i$ can be split into at least two disjoint
subsets such that $G$ does not contain any edges between them. This contradicts the
fact that $\Pi_i$ corresponds to a connected component in graph $G$.


3 Joint Graphical Lasso 

To learn the underlying graph structure of multiple classes simultaneously, some
penalty functions are used to promote similar structural patterns among different
classes, including those in [16], [3], [6], [12], [13], [25], [20], [21]. A typical
joint graphical lasso is formulated as the following optimization problem:


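$$\min_{\Theta} \sum_{k=1}^{K} L(\Theta^{(k)}) + P(\Theta) \qquad (1)$$

where $\Theta^{(k)} \succ 0$ is the precision matrix ($k = 1, \ldots, K$) and
$\Theta$ represents the set of $\Theta^{(k)}$. The negative log-likelihood
$L(\Theta^{(k)})$ and the regularization $P(\Theta)$ are defined as follows.

$$L(\Theta^{(k)}) = -\log\det(\Theta^{(k)}) + \mathrm{tr}(S^{(k)} \Theta^{(k)}) \qquad (2)$$

$$P(\Theta) = \lambda_1 \sum_{k=1}^{K} \|\Theta^{(k)}\|_1 + \lambda_2 J(\Theta) \qquad (3)$$

Here $\lambda_1 > 0$ and $\lambda_2 > 0$, and $J(\Theta)$ is some penalty function
used to encourage similarity (of the structural patterns) among the $K$ classes. In
this paper, we focus on group graphical lasso. That is,

$$J(\Theta) = 2 \sum_{1 \le i < j \le p} \sqrt{\sum_{k=1}^{K} \left(\Theta^{(k)}_{i,j}\right)^2} \qquad (4)$$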
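As a small numerical illustration of the regularization term, the sketch below evaluates $P(\Theta)$ with the group penalty of Eq. (4) on two 2 x 2 matrices. It assumes that $\|\cdot\|_1$ is the entrywise norm over all entries (diagonal included), which the text does not state explicitly; the function name and the input values are hypothetical.

```python
import math


def group_lasso_penalty(thetas, lam1, lam2):
    """P(Theta) = lam1 * sum_k ||Theta^(k)||_1 + lam2 * J(Theta), J as in Eq. (4).

    Assumption: ||.||_1 sums absolute values of all entries, diagonal included.
    """
    K, p = len(thetas), len(thetas[0])
    l1 = sum(abs(thetas[k][i][j])
             for k in range(K) for i in range(p) for j in range(p))
    group = 2.0 * sum(
        math.sqrt(sum(thetas[k][i][j] ** 2 for k in range(K)))
        for i in range(p) for j in range(i + 1, p)
    )
    return lam1 * l1 + lam2 * group


thetas = [
    [[2.0, 3.0], [3.0, 2.0]],   # hypothetical Theta^(1)
    [[1.0, 4.0], [4.0, 1.0]],   # hypothetical Theta^(2)
]
# l1 part: 10 + 10 = 20; group part: 2 * sqrt(3^2 + 4^2) = 10
print(group_lasso_penalty(thetas, 0.5, 0.1))  # 0.5 * 20 + 0.1 * 10 = 11.0
```

The group term couples the same off-diagonal position across classes, which is what drives the shared sparsity pattern.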


4 Uniform Thresholding 

Covariance thresholding methods, which identify zero entries in a precision matrix
before directly solving an optimization problem like Eq. (1), are widely used to
accelerate solving graphical lasso. In particular, a screening method divides the
variable set into some disjoint groups such that when two variables (or features)
are not in the same group, their corresponding entry in the precision matrix is
guaranteed to be 0. Using this method, the precision matrix can be split into some
submatrices, each corresponding to one distinct group. To achieve the best
computational efficiency, we shall divide the variable set into groups that are as
small as possible, subject to the constraint that two related variables shall be in
the same group. A screening method has also been described for group graphical
lasso. This method uses a single thresholding criterion (i.e., uniform thresholding)
for all the K classes, i.e., it employs a uniformly feasible partition of the
variable set across all the K classes. Existing methods such as those described in
[25], [20] for fused graphical lasso and that for node-based learning all employ
uniform thresholding.




Uniform thresholding may not be able to divide the variable set into the finest
feasible partition for each individual class when the K underlying concentration
graphs are not exactly the same. For example, Figure 1(a) and (c) show two
concentration graphs of two different classes. These two graphs differ in variables
1 and 6, and each graph can be split into two connected components. However, the
mixing graph in (b) has only one connected component, so it cannot be split further.
According to Theorem 1, no uniform feasible partition can divide the variable set
into two disjoint groups without losing accuracy. It is expected that when the
number of classes and variables increases, uniform thresholding may perform even
worse.



Fig. 1. Illustration of uniform thresholding impacted by minor structure difference
between two classes. (a) and (c): the edge matrix and concentration graph for each
of the two classes. (b): the concentration graph resulting from the mixing of the
two graphs in (a) and (c).


5 Non-uniform Thresholding 


Non-uniform thresholding generates a non-uniform feasible partition by thresholding
the K empirical covariance matrices separately. In a non-uniform partition, two
variables of the same group in one class may belong to different groups in another
class. Figure 2 shows an example of a non-uniform partition. In this example, all
the matrix elements in white are set to 0 by non-uniform thresholding. Each color
other than white indicates one group. The 7th and 9th variables belong to the same
group in the left matrix, but not in the right matrix. Similarly, the 4th variable
and another variable belong to the same group in the right matrix, but not in the
left matrix.

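We now present necessary and sufficient conditions for identifying a non-uniform
feasible partition for group graphical lasso, with the penalty defined in Eq. (3)
and (4).

Given a non-uniform partition $\{\mathcal{P}^{(1)}, \mathcal{P}^{(2)}, \ldots,
\mathcal{P}^{(K)}\}$ for the $K$ classes, let $\mathcal{F}^{(k)}(i) = t$ denote the
group which the variable $i$ belongs to in the $k$-th class, i.e.,
$\mathcal{F}^{(k)}(i) = t \Leftrightarrow i \in \mathcal{P}^{(k)}_t$.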























Fig. 2. Illustration of a non-uniform partition. White color indicates zero entries
detected by covariance thresholding. Entries with the same color other than white
belong to the same group.


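We define pairwise relationship matrices $I^{(k)}$ ($1 \le k \le K$) as follows:

$$I^{(k)}_{i,j} = I^{(k)}_{j,i} = \begin{cases} 0, & \text{if } \mathcal{F}^{(k)}(i) \ne \mathcal{F}^{(k)}(j) \\ 1, & \text{otherwise} \end{cases} \qquad (5)$$

Also, we define $Z^{(k)}$ ($1 \le k \le K$) as follows:

$$Z^{(k)}_{i,j} = Z^{(k)}_{j,i} = \lambda_1 + \lambda_2 \times \tau\Big(\sum_{t \ne k} |\hat{\Theta}^{(t)}_{i,j}| = 0\Big) \qquad (6)$$

Here $\tau(b)$ is the indicator function.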

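The following two theorems state the necessary and sufficient conditions of a
non-uniform feasible partition. See the supplementary material for their proofs.

Theorem 2 If $\{\mathcal{P}^{(1)}, \mathcal{P}^{(2)}, \ldots, \mathcal{P}^{(K)}\}$
is a non-uniform feasible partition of the variable set $V$, then for any pair
$(i, j)$ ($1 \le i \ne j \le p$) the following conditions must be satisfied:

$$\begin{cases} \sum_{k=1}^{K} \big(|S^{(k)}_{i,j}| - \lambda_1\big)_+^2 \le \lambda_2^2, & \text{if } \forall k \in \{1, 2, \ldots, K\},\ I^{(k)}_{i,j} = 0 \\ |S^{(k)}_{i,j}| \le Z^{(k)}_{i,j}, & \text{if } I^{(k)}_{i,j} = 0 \text{ and } \exists t \ne k,\ I^{(t)}_{i,j} = 1 \end{cases} \qquad (7)$$

Here, each $S^{(k)}$ is a covariance matrix of the $k$-th class and
$x_+ = \max(0, x)$.

Theorem 3 If for any pair $(i, j)$ ($1 \le i \ne j \le p$) the following conditions
hold, then $\{\mathcal{P}^{(1)}, \mathcal{P}^{(2)}, \ldots, \mathcal{P}^{(K)}\}$ is
a non-uniform feasible partition of the variable set $V$.

$$\begin{cases} \sum_{k=1}^{K} \big(|S^{(k)}_{i,j}| - \lambda_1\big)_+^2 \le \lambda_2^2, & \text{if } \forall k \in \{1, 2, \ldots, K\},\ I^{(k)}_{i,j} = 0 \\ |S^{(k)}_{i,j}| \le \lambda_1, & \text{if } I^{(k)}_{i,j} = 0 \text{ and } \exists t \ne k,\ I^{(t)}_{i,j} = 1 \end{cases} \qquad (8)$$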
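The sufficient condition of Theorem 3 can be tested directly for a candidate partition. The sketch below is one way to do so, assuming the partition of each class is given as a list of group labels per variable (0-based); the function name and the toy covariance values are hypothetical.

```python
def satisfies_condition_8(S, labels, lam1, lam2):
    """Check condition (8). S[k][i][j]: sample covariances of class k;
    labels[k][i]: group label of variable i in class k."""
    K, p = len(S), len(S[0])
    for i in range(p):
        for j in range(i + 1, p):
            # split[k] is True when I^(k)_{i,j} = 0 (i and j in different groups)
            split = [labels[k][i] != labels[k][j] for k in range(K)]
            if all(split):
                total = sum(max(0.0, abs(S[k][i][j]) - lam1) ** 2 for k in range(K))
                if total > lam2 ** 2:
                    return False
            elif any(split[k] and abs(S[k][i][j]) > lam1 for k in range(K)):
                # some class keeps i and j together, so every class that
                # separates them must pass the class-specific test
                return False
    return True


S = [  # hypothetical sample covariances for K = 2 classes, p = 3 variables
    [[1.0, 0.07, 0.01], [0.07, 1.0, 0.01], [0.01, 0.01, 1.0]],
    [[1.0, 0.03, 0.01], [0.03, 1.0, 0.07], [0.01, 0.07, 1.0]],
]
non_uniform = [[0, 0, 1], [0, 1, 1]]   # class 1: {1,2},{3}; class 2: {1},{2,3}
singletons = [[0, 1, 2], [0, 1, 2]]
print(satisfies_condition_8(S, non_uniform, 0.04, 0.02))  # True
print(satisfies_condition_8(S, singletons, 0.04, 0.02))   # False
```

In this toy case the all-singleton partition fails the global test on the pair (1, 2), whereas the non-uniform partition passes both tests.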




















































Algorithm 1 Hybrid Covariance Screening Algorithm

for k = 1 to K do
    Initialize $I^{(k)}_{i,j} = 1$, $\forall\, 1 \le i < j \le p$
    Set $I^{(k)}_{i,j} = 0$, if $|S^{(k)}_{i,j}| \le \lambda_1$ and $i \ne j$
    Set $I^{(k)}_{i,j} = 0$, if $\sum_{k=1}^{K} (|S^{(k)}_{i,j}| - \lambda_1)_+^2 \le \lambda_2^2$ and $i \ne j$
end for
for k = 1 to K do
    Construct a graph $G^{(k)}$ for $V$ from $I^{(k)}$
    Find connected components of $G^{(k)}$
    for $\forall (i, j)$ in the same component of $G^{(k)}$ do
        Set $I^{(k)}_{i,j} = I^{(k)}_{j,i} = 1$
    end for
end for
repeat
    Search for a triple $(x, i, j)$ satisfying the following condition:
        $I^{(x)}_{i,j} = 0$, $|S^{(x)}_{i,j}| > \lambda_1$ and $\exists s$, s.t. $I^{(s)}_{i,j} = 1$
    if $\exists (x, i, j)$ satisfying the condition above then
        merge the two components of $G^{(x)}$ that contain variables $i$ and $j$ into a new component;
        for $\forall (m, n)$ in this new component do
            Set $I^{(x)}_{m,n} = I^{(x)}_{n,m} = 1$
        end for
    end if
until no such triple exists.
return the connected components of each graph, which define the non-uniform feasible solution
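The following is a minimal Python sketch of the hybrid screening procedure, not the authors' C++/R implementation. It tracks the connected components of each class with a union-find structure instead of storing the matrices $I^{(k)}$ explicitly, and it uses 0-based variable indices; the function name and the toy covariance matrices are hypothetical.

```python
def hybrid_screening(S, lam1, lam2):
    """Return, for each class k, the groups of variables found by hybrid screening."""
    K, p = len(S), len(S[0])
    pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
    parent = [list(range(p)) for _ in range(K)]  # one union-find forest per class

    def find(k, x):
        while parent[k][x] != x:
            parent[k][x] = parent[k][parent[k][x]]
            x = parent[k][x]
        return x

    def union(k, i, j):
        ri, rj = find(k, i), find(k, j)
        if ri == rj:
            return False
        parent[k][ri] = rj
        return True

    def globally_zero(i, j):
        total = sum(max(0.0, abs(S[k][i][j]) - lam1) ** 2 for k in range(K))
        return total <= lam2 ** 2

    # Initial graphs: keep edge (i, j) in class k only if it fails both the
    # global test and the class-specific test; union-find yields the components.
    for i, j in pairs:
        if globally_zero(i, j):
            continue
        for k in range(K):
            if abs(S[k][i][j]) > lam1:
                union(k, i, j)

    # Repeat: if i and j share a component in some class, every class x with
    # |S^(x)_{i,j}| > lam1 must also keep them together.
    changed = True
    while changed:
        changed = False
        for i, j in pairs:
            if not any(find(s, i) == find(s, j) for s in range(K)):
                continue
            for x in range(K):
                if abs(S[x][i][j]) > lam1 and union(x, i, j):
                    changed = True

    partitions = []
    for k in range(K):
        groups = {}
        for v in range(p):
            groups.setdefault(find(k, v), []).append(v)
        partitions.append(sorted(groups.values()))
    return partitions


S = [  # hypothetical covariances, K = 2 classes, p = 3 variables
    [[1.0, 0.07, 0.05], [0.07, 1.0, 0.03], [0.05, 0.03, 1.0]],
    [[1.0, 0.03, 0.05], [0.03, 1.0, 0.07], [0.05, 0.07, 1.0]],
]
print(hybrid_screening(S, 0.04, 0.02))  # [[[0, 1], [2]], [[0], [1, 2]]]
```

With these values, class-specific thresholding alone would keep each class in one group (two entries per class exceed $\lambda_1$), and global thresholding alone would also keep one group, yet the combination separates one variable in each class.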


Algorithm 1 is a covariance thresholding algorithm that can identify a non-uniform
feasible partition satisfying condition (8). We call Algorithm 1 the hybrid
screening algorithm as it utilizes both class-specific thresholding (e.g.,
$|S^{(k)}_{i,j}| \le \lambda_1$) and global thresholding (e.g.,
$\sum_{k=1}^{K} (|S^{(k)}_{i,j}| - \lambda_1)_+^2 \le \lambda_2^2$) to identify a
non-uniform partition. This hybrid screening algorithm can terminate rapidly on a
typical Linux machine, tested on the synthetic data described in Section 7 with
$K = 10$ and $p = 10000$.

We can generate a uniform feasible partition using only the global thresholding
and generate a non-uniform feasible partition by using only the class-specific
thresholding, but such a partition is not as good as the one obtained with the
hybrid thresholding algorithm. Let $\{\mathcal{H}^{(1)}, \mathcal{H}^{(2)}, \ldots,
\mathcal{H}^{(K)}\}$, $\{\mathcal{L}^{(1)}, \mathcal{L}^{(2)}, \ldots,
\mathcal{L}^{(K)}\}$ and $\mathcal{G}$ denote the partitions generated by the
hybrid, class-specific and global thresholding algorithms, respectively. It is
obvious that $\mathcal{H}^{(k)} \preceq \mathcal{L}^{(k)}$ and
$\mathcal{H}^{(k)} \preceq \mathcal{G}$ for $k = 1, 2, \ldots, K$, since condition
(8) is a combination of both global thresholding and class-specific thresholding.

Figure 3 shows a toy example comparing the three screening methods using a
dataset of two classes and three variables. In this example, the class-specific or
the global thresholding alone cannot divide the variable set into disjoint groups,
but their combination can do so.

We have the following theorem regarding our hybrid thresholding algorithm, which
is proved in the supplementary material.






Fig. 3. Comparison of three thresholding strategies. The dataset contains 2 slightly
different classes and 3 variables. The two sample covariance matrices are shown on
the top of the figure. The parameters used are $\lambda_1 = 0.04$ and
$\lambda_2 = 0.02$.


Theorem 4 The hybrid screening algorithm yields the finest non-uniform feasible
partition satisfying condition (8).


6 Hybrid ADMM (HADMM) 

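In this section, we describe how to apply ADMM (Alternating Direction Method of
Multipliers [2], [5]) to solve joint graphical lasso based upon a non-uniform
feasible partition of the variable set. According to prior work, solving Eq. (1) by
ADMM is equivalent to minimizing the following scaled augmented Lagrangian form:

$$\sum_{k=1}^{K} L(\Theta^{(k)}) + \frac{\rho}{2} \sum_{k=1}^{K} \|\Theta^{(k)} - Y^{(k)} + U^{(k)}\|_F^2 + P(Y) \qquad (9)$$

where $\rho > 0$ is the ADMM penalty parameter, and $Y = \{Y^{(1)}, Y^{(2)}, \ldots,
Y^{(K)}\}$ and $U = \{U^{(1)}, U^{(2)}, \ldots, U^{(K)}\}$ are the auxiliary and
(scaled) dual variables, respectively. We use the ADMM algorithm to solve Eq. (9)
iteratively, which updates the three variables $\Theta$, $Y$ and $U$ alternately.
The most computationally intensive step is to update $\Theta$ given $Y$ and $U$,
which requires eigen-decomposition of $K$ matrices. We can do this based upon a
non-uniform feasible partition $\{\mathcal{H}^{(1)}, \mathcal{H}^{(2)}, \ldots,
\mathcal{H}^{(K)}\}$. For each $k$, updating $\Theta^{(k)}$ given $Y^{(k)}$ and
$U^{(k)}$ for Eq. (9) is equivalent to solving in total $|\mathcal{H}^{(k)}|$
independent sub-problems. For each $\mathcal{H}^{(k)}_j \in \mathcal{H}^{(k)}$, its
independent sub-problem solves the following equation: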
Fig. 4. The objective function value with respect to the number of iterations on a
six-class Type C dataset with $p = 1000$, $\lambda_1 = 0.0082$ and
$\lambda_2 = 0.0015$.


Solving Eq. (10) requires eigen-decomposition of small submatrices, which is much
faster than the eigen-decomposition of the original large matrices. Based upon our
non-uniform partition, updating $Y$ given $\Theta$ and $U$ and updating $U$ given
$Y$ and $\Theta$ are also faster than the corresponding components of the plain
ADMM algorithm described in [3], since our non-uniform thresholding algorithm can
detect many more zero entries before ADMM is applied.
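The block-wise $\Theta$ update can be sketched with NumPy as follows. This is an illustration derived from the stationarity condition of the scaled augmented Lagrangian with $L$ as in Eq. (2), namely $-\Theta^{-1} + S + \rho(\Theta - Y + U) = 0$ on each block, and is not taken from the authors' code. It assumes that $Y$ and $U$ are symmetric and share the block structure of the partition; the function names are hypothetical.

```python
import numpy as np


def update_theta_block(S_blk, Y_blk, U_blk, rho):
    """Minimise -logdet(T) + tr(S T) + (rho/2) * ||T - Y + U||_F^2 over one block."""
    # Stationarity gives rho*T - inv(T) = rho*(Y - U) - S, which is solved
    # eigenvalue by eigenvalue: rho*t - 1/t = d  =>  t = (d + sqrt(d^2 + 4 rho)) / (2 rho)
    d, V = np.linalg.eigh(rho * (Y_blk - U_blk) - S_blk)
    t = (d + np.sqrt(d ** 2 + 4.0 * rho)) / (2.0 * rho)
    return (V * t) @ V.T


def update_theta(S, Y, U, rho, groups):
    """Update one class block by block; entries across different groups stay zero."""
    theta = np.zeros_like(S, dtype=float)
    for g in groups:
        idx = np.ix_(g, g)
        theta[idx] = update_theta_block(S[idx], Y[idx], U[idx], rho)
    return theta


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
S = A @ A.T / 4.0            # hypothetical sample covariance
Y, U, rho = np.eye(4), np.zeros((4, 4)), 1.0
theta = update_theta(S, Y, U, rho, groups=[[0, 1], [2, 3]])
print(np.round(theta, 3))
```

Each block costs one eigen-decomposition of its own size, so a finer partition replaces one large decomposition with several small ones.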

7 Experimental Results 

We tested our method, denoted as HADMM (i.e., hybrid covariance thresholding
algorithm + ADMM), on both synthetic and real data and compared HADMM with two
control methods: 1) GADMM: global covariance thresholding algorithm + ADMM; and
2) LADMM: class-specific covariance thresholding algorithm + ADMM. We implemented
these methods with C++ and R, and tested them on a Linux machine with an Intel Xeon
E5-2670 2.6GHz.

To generate a dataset with K classes from a Gaussian distribution, we first randomly
generate K precision matrices and then use them to sample 5 × p data points for each
class. To make sure that the randomly generated precision matrices are positive
definite, we set all the diagonal entries to 5.0, and an off-diagonal entry to
either 0 or ±r × 5.0. We generate three types of datasets as follows.

- Type A: 97% of the entries in a precision matrix are 0.

- Type B: the K precision matrices have the same diagonal block structure.

- Type C: the K precision matrices have slightly different diagonal block structures.

Fig. 5. Logarithm of the running time (in seconds) of HADMM, LADMM and GADMM for
p = 1000 on (a) Type A, (b) Type B and (c) Type C data.

For Type A, r is set to be less than 0.0061. For Type B and Type C, r is smaller
than 0.0067. For each type we generate 18 datasets by setting K = 2, 3, ..., 10 and
p = 1000, 10000, respectively.
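A Type A style dataset could be generated along the following lines. This is only a sketch under stated assumptions: the magnitude of each nonzero off-diagonal entry is drawn uniformly below the bound on r, the nonzero positions are chosen independently at random, and a small p is used for illustration; none of these details, nor the function names, are specified in the text.

```python
import numpy as np


def random_precision(p, density, r_max, rng):
    """Diagonal entries 5.0; a `density` fraction of off-diagonal pairs set to
    +/- r * 5.0 with r drawn uniformly from [0, r_max) (an assumption)."""
    theta = np.zeros((p, p))
    rows, cols = np.triu_indices(p, k=1)
    mask = rng.random(rows.size) < density
    n_nonzero = int(mask.sum())
    signs = rng.choice([-1.0, 1.0], size=n_nonzero)
    theta[rows[mask], cols[mask]] = signs * rng.uniform(0.0, r_max, size=n_nonzero) * 5.0
    theta = theta + theta.T
    np.fill_diagonal(theta, 5.0)
    return theta


def sample_class(theta, n, rng):
    """Draw n observations from N(0, inverse of theta)."""
    cov = np.linalg.inv(theta)
    cov = (cov + cov.T) / 2.0  # remove round-off asymmetry
    return rng.multivariate_normal(np.zeros(theta.shape[0]), cov, size=n)


rng = np.random.default_rng(0)
p, K = 50, 3
thetas = [random_precision(p, 0.03, 0.0061, rng) for _ in range(K)]
data = [sample_class(theta, 5 * p, rng) for theta in thetas]
print(data[0].shape)  # (250, 50)
```

Because every off-diagonal magnitude is far below the diagonal value 5.0, each matrix is strictly diagonally dominant and hence positive definite.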

7.1 Correctness of HADMM by Experimental Validation 

We first show that HADMM can converge to the same solution obtained by the plain 
ADMM (i.e., ADMM without any covariance thresholding) through experiments. 

To evaluate the correctness of our method HADMM, we compare the objective
function value generated by HADMM to that by ADMM with respect to the number
of iterations. We run the two methods for 500 iterations over the three types of
data with p = 1000. As shown in Table 1, in the first 4 iterations, HADMM and ADMM






























Table 1. Objective function values of HADMM and ADMM on the six-class Type C data
(first 4 iterations, p = 1000, $\lambda_1 = 0.0082$, $\lambda_2 = 0.0015$)

Iteration          1           2           3           4
ADMM         1713.66    -283.743    -1191.94    -1722.53
HADMM        1734.42    -265.073    -1183.73    -1719.78


yield slightly different objective function values. However, as more iterations
pass, both HADMM and ADMM converge to the same objective function value, as
shown in Figure 4 and Supplementary Figures S3-5. This experimental result confirms
that our hybrid covariance thresholding algorithm is correct. We tested several
pairs of hyper-parameters ($\lambda_1$ and $\lambda_2$) in our experiment. Please
refer to the supplementary material for model selection. Note that although HADMM
and ADMM converge similarly in terms of the number of iterations, HADMM runs much
faster than ADMM at each iteration, so HADMM converges in a much shorter time.


7.2 Performance on Synthetic Data 

In the previous section we have shown that our HADMM converges to the same solution
as ADMM. Here we test the running times of HADMM, LADMM and GADMM needed to reach
the following stop criteria for p = 1000: $\sum_{k=1}^{K} \cdots < 10^{-6}$ and
$\sum_{k=1}^{K} \cdots < 10^{-6}$. For p = 10000, considering the large amount of
running time needed for LADMM and GADMM, we run only 50 iterations for all the
three methods and then compare the average running time for a single iteration.

We tested the running time of the three methods using different parameters
$\lambda_1$ and $\lambda_2$ over the three types of data. See the supplementary
material for model selection. We show the result for p = 1000 in Figure 5 and that
for p = 10000 in Figures S15-23 in the supplementary material.

In Figure 5, each row shows the experimental results on one type of data (Type A,
Type B and Type C from top to bottom). Each column has the experimental results for
the same hyper-parameters ($\lambda_1 = 0.009$ and $\lambda_2 = 0.0005$,
$\lambda_1 = 0.0086$ and $\lambda_2 = 0.001$, and $\lambda_1 = 0.0082$ and
$\lambda_2 = 0.0015$ from left to right). As shown in Figure 5, HADMM is much more
efficient than LADMM and GADMM. GADMM performs comparably to or better than LADMM
when $\lambda_2$ is large. The running time of LADMM increases as $\lambda_1$
decreases. Also, the running time of all the three methods increases along with the
number of classes. However, GADMM is more sensitive to the number of classes than
our HADMM. Moreover, as our hybrid covariance thresholding algorithm yields finer
non-uniform feasible partitions, the precision matrices are more likely to be split
into many more, smaller submatrices. This means it is potentially easier to
parallelize HADMM to obtain even more speedup.

We also compare the three screening algorithms in terms of the estimated
computational complexity for matrix eigen-decomposition, a time-consuming
subroutine used by the ADMM algorithms. Given a partition $\mathcal{H}$ of the
variable set $V$, the computational complexity can be estimated by
$\sum_{\mathcal{H}_i \in \mathcal{H}} |\mathcal{H}_i|^3$. As shown in Supplementary
Figures S6-14, when p = 1000, our non-uniform thresholding algorithm generates
partitions with much smaller computational complexity, usually a small fraction of
that of the other two methods. Note that in these figures the Y-axis is the
logarithm of the estimated computational complexity. When p = 10000, the advantage
of our non-uniform thresholding algorithm over the other two is even larger, as
shown in Figures S24-32 in the supplementary material.

Fig. 6. Network of the first 100 genes of class one and class three for Setting 1.
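The effect of a finer partition on the eigen-decomposition cost can be illustrated with a few lines of Python. The sketch assumes the usual cubic cost of a dense eigen-decomposition, so the estimate for a partition is the sum of cubed group sizes; the partitions below are hypothetical.

```python
def eig_cost(partition):
    """Estimated eigen-decomposition cost: sum of cubed group sizes."""
    return sum(len(group) ** 3 for group in partition)


single_group = [list(range(1000))]                                  # no splitting
ten_groups = [list(range(i, i + 100)) for i in range(0, 1000, 100)]  # ten groups of 100

print(eig_cost(single_group))  # 1000000000
print(eig_cost(ten_groups))    # 10000000
```

Splitting 1000 variables into ten equal groups reduces the estimate by a factor of 100, which is why finer partitions translate into shorter running times.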

7.3 Performance on Real Gene Expression Data 

We test our proposed method on real gene expression data. We use a lung cancer
dataset (accession number GDS2771) downloaded from Gene Expression Omnibus and a
previously described mouse immune dataset. The immune dataset consists of 214
observations. The lung cancer data is collected from 97 patients with lung cancer
and 90 controls without lung cancer, so this lung cancer dataset consists of two
different classes: patient and control. We treat the 214 observations from the
immune dataset, the 97 lung cancer observations and the 90 controls as three
classes of a compound dataset for our joint inference task. These three classes
share 10726 common genes, so this dataset has 10726 features and 3 classes. Because
the absolute values of the entries of the covariance matrix of the first class
(corresponding to the immune observations) are relatively larger, we divide each
entry of this covariance matrix by 2 so that the three covariance matrices have
similar magnitude before performing joint analysis with a single pair of
$\lambda_1$ and $\lambda_2$.

The running times (first 10 iterations) of HADMM, LADMM and GADMM for this
compound dataset under different settings are shown in Table 2, and the resultant
gene networks with different sparsity are shown in Fig. 6 and the supplementary
material.

As shown in Table 2, HADMM (ADMM + our hybrid screening algorithm) is always more
efficient than the other two methods in different settings. Typically, when
$\lambda_1$ is small and $\lambda_2$ is large (Setting 1), our method is much faster
than LADMM. In contrast, when $\lambda_2$ is small and $\lambda_1$ is large enough
(Setting 4 and Setting 5), our method is much faster than GADMM. Moreover, when
both $\lambda_1$ and $\lambda_2$ take moderate values (Setting 2 and Setting 3),
HADMM is still much faster than both GADMM and LADMM.


Table 2. Running time (hours) of HADMM, LADMM and GADMM on real data. (Setting 1:
$\lambda_1 = 0.1$ and $\lambda_2 = 0.5$; Setting 2: $\lambda_1 = 0.2$ and
$\lambda_2 = 0.2$; Setting 3: $\lambda_1 = 0.3$ and $\lambda_2 = 0.1$; Setting 4:
$\lambda_1 = 0.4$ and $\lambda_2 = 0.05$; and Setting 5: $\lambda_1 = 0.5$ and
$\lambda_2 = 0.01$)

Method    Setting 1    Setting 2    Setting 3    Setting 4    Setting 5
HADMM          3.46         8.23          3.9         1.71         1.11
LADMM          > 20         > 20         13.6         3.72         1.98
GADMM           4.2         > 20         > 20        11.04         6.93


As shown in Fig. 6, the two resultant networks have very similar topology. This is
reasonable because we use a large $\lambda_2$ in Setting 1. In fact, the networks
of all the three classes under Setting 1 share very similar topology. Moreover, the
number of edges in the network decreases significantly as $\lambda_1$ goes to 0.5,
as shown in the supplementary material.

8 Conclusion and Discussion 

This paper has presented a non-uniform or hybrid covariance thresholding algorithm to 
speed up solving group graphical lasso. We have established necessary and sufficient 
conditions for this thresholding algorithm. Theoretical analysis and experimental tests 
demonstrate the effectiveness of our algorithm. Although this paper focuses only on 
group graphical lasso, the proposed ideas and techniques may also be extended to fused 
graphical lasso. 

In this paper, we only show how to combine our covariance thresholding algorithm
with ADMM to solve group graphical lasso. In fact, our thresholding algorithm can
be combined with other methods developed for (joint) graphical lasso such as the
QUIC algorithm, the proximal gradient method, and even the quadratic method
developed for fused graphical lasso [20].

The thresholding algorithm presented in this paper is static in the sense that it is
applied as a pre-processing step before ADMM is applied to solve group graphical
lasso. We can extend this "static" thresholding algorithm to a "dynamic" version.
For example, we can identify zero entries in the precision matrix of a specific
class based upon intermediate estimation of the precision matrices of the other
classes. By doing so, we shall be able to obtain finer feasible partitions and
further improve the computational efficiency.